Unsupervised Learning of Stereo Vision with Monocular Cues

Authors

  • Hoang Trinh
  • David McAllester
Abstract

We demonstrate unsupervised learning of a stereo vision model involving monocular depth cues (shape from texture cues). We formulate a conditional probability model defining the probability of the right image given the left. This conditional model does not model a probability distribution over images. Maximizing conditional likelihood rather than joint likelihood is similar to using a CRF (Conditional Random Field, [6]) rather than an MRF (joint Markov Random Field). The most closely related earlier work seems to be that of Zhang and Seitz [8], who give a method for adapting five parameters of a stereo vision model. In contrast, we train highly parameterized monocular depth cues. Also, we avoid the need for independence assumptions through the use of contrastive divergence training, a general method for optimizing CRFs [4]. There is also related work by Saxena et al. on supervised learning of highly parameterized monocular depth cues [1, 2]. Unlike Saxena et al., we train monocular depth cues as part of unsupervised training of a stereo algorithm. Other related work includes that of Scharstein and Pal [7] and Kong and Tao [5], who perform supervised training of stereo algorithms using general CRF methods. We focus on histogram of oriented gradients (HOG) features as a (texture) surface orientation cue. As a surface is tilted away from the camera, the edges in the direction of the tilt become foreshortened while the edges orthogonal to the tilt are not. The effect on the edge distribution is illustrated in the original figure, which shows the average HOG feature for regions of tree trunk and forest floor. The cylindrical shape of the tree trunk is clearly indicated by the warping of the HOG feature.
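The foreshortening effect described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' feature extractor or training code: `orientation_histogram` and `synthetic_texture` are hypothetical helper names, the texture is a synthetic sum of randomly oriented sinusoids, and tilt is crudely approximated by squeezing the texture along the vertical axis. Under those assumptions, a single-cell, magnitude-weighted orientation histogram (the core ingredient of a HOG descriptor) shifts its mass toward vertical gradients, that is, toward horizontal edges, once the texture is compressed.

```python
import numpy as np


def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    This corresponds to a single HOG cell without block normalization.
    """
    gy, gx = np.gradient(patch.astype(float))  # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / hist.sum()


def synthetic_texture(size=64, tilt_compress=1.0, n_waves=40, seed=0):
    """Roughly isotropic texture built from randomly oriented sinusoids.

    tilt_compress < 1 squeezes the texture vertically, a crude stand-in
    (an assumption of this sketch) for a surface tilted away from the camera.
    """
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:size, 0:size].astype(float)
    y = y / tilt_compress
    thetas = rng.uniform(0.0, np.pi, n_waves)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_waves)
    freq = 0.05  # cycles per pixel before compression
    return sum(
        np.cos(2.0 * np.pi * freq * (x * np.cos(t) + y * np.sin(t)) + p)
        for t, p in zip(thetas, phases)
    )


flat = orientation_histogram(synthetic_texture(tilt_compress=1.0))
tilted = orientation_histogram(synthetic_texture(tilt_compress=0.5))

# Bins 3..5 cover gradient orientations within 30 degrees of vertical.
print("near-vertical gradient mass, fronto-parallel:", flat[3:6].sum())
print("near-vertical gradient mass, tilted:         ", tilted[3:6].sum())
```

Comparing the two printed values shows how the histogram warps as the texture is squeezed. A trained monocular cue would learn this relationship from data rather than rely on a hand-built test like this one.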


Similar resources

A Machine Learning Approach to Recovery of Scene Geometry from Images

Recovering the 3D structure of the scene from images yields useful information for tasks such as shape and scene recognition, object detection, or motion planning and object grasping in robotics. In this thesis, we introduce a general machine learning approach called unsupervised CRF learning based on maximizing the conditional likelihood. We describe the application of our machine learning app...


Depth Estimation Using Monocular and Stereo Cues

Depth estimation in computer vision and robotics is most commonly done via stereo vision (stereopsis), in which images from two cameras are used to triangulate and estimate distances. However, there are also numerous monocular visual cues (such as texture variations and gradients, defocus, color/haze, etc.) that have heretofore been little exploited in such systems. Some of these cues apply even...


Persistent self-supervised learning principle: from stereo to monocular vision for obstacle avoidance

Self-Supervised Learning (SSL) is a reliable learning mechanism in which a robot uses an original, trusted sensor cue for training to recognize an additional, complementary sensor cue. We study for the first time in SSL how a robot’s learning behavior should be organized, so that the robot can keep performing its task in the case that the original cue becomes unavailable. We study this persiste...


Recovering stereo vision by squashing virtual bugs in a virtual reality environment.

Stereopsis is the rich impression of three-dimensionality, based on binocular disparity, the differences between the two retinal images of the same world. However, a substantial proportion of the population is stereo-deficient, and relies mostly on monocular cues to judge the relative depth or distance of objects in the environment. Here we trained adults who were stereo blind or stereo-deficien...


Extracting 3D Scene-Consistent Object Proposals and Depth from Stereo Images

This work combines two active areas of research in computer vision: unsupervised object extraction from a single image, and depth estimation from a stereo image pair. A recent, successful trend in unsupervised object extraction is to exploit so-called “3D scene-consistency”, that is enforcing that objects obey underlying physical constraints of the 3D scene, such as occupancy of 3D space and gr...




Publication date: 2009